Dactylopterus volitans

Many species can be studied using repositories, packages, and functions in the R language. Shall we give it a try?

To begin, we need to choose the species we are going to analyze. How about Dactylopterus volitans, known in Portuguese as peixe-voador (the flying gurnard in English)? The common name comes from its fins, which look like wings when spread. It is a marine and estuarine species associated with reefs, and its distribution spans the Atlantic: from Canada to Argentina on the western side, and along the coast of Africa and into the Mediterranean on the eastern side. For more information about the species, see FishBase.

Dactylopterus volitans - FishBase

Accessing and cleaning data from GBIF and OBIS

GBIF

First, we will extract data from the GBIF and OBIS repositories with the rgbif and robis packages. We will also use the tidyverse package, which loads other packages such as dplyr, for manipulating data, and ggplot2, for visualizing data. We start by loading rgbif, which gives access to GBIF data. Its occ_data function lets us search for species occurrence data by scientific name, numeric identifier (such as a taxon key), or location. Our script will look something like this:

library(tidyverse)
library(rgbif)
flying_gbif <- occ_data(scientificName = "Dactylopterus volitans",
                        hasCoordinate = TRUE,
                        hasGeospatialIssue = FALSE)

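Now we can start cleaning the data with the functions gbif_issues and strsplit. The first lists the issue codes that GBIF can attach to a record, which lets us identify possible errors; the second splits the comma-separated codes stored in the issues column into individual values. The commands and their output are shown in the script that follows, where the $ operator accesses an element of the list returned by occ_data, such as the data frame named data that was downloaded from GBIF. We also use dim to check the dimensions of the returned data. Calling dim on the whole object returns NULL because it is a list rather than a data frame, while dim(flying_gbif$data) reports 500 rows and 152 columns; 500 is the default value of the limit argument of occ_data, so GBIF may hold more records than were downloaded.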
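Before running these steps on the real download, it can help to try them on a small hand-made object. The sketch below is only an illustration under stated assumptions: toy_gbif is a hypothetical stand-in for the list returned by occ_data, and its keys and issue strings are made up (the codes themselves are taken from the gbif_issues table). It shows why dim needs the $data element, and how strsplit turns comma-separated issue strings into individual codes.

```r
# Toy stand-in for the list returned by occ_data(); all values are made up.
toy_gbif <- list(
  meta = list(count = 4),
  data = data.frame(
    key = 1:4,
    issues = c("cdround,gass84", "gass84", "cdround,cudc", "gass84"),
    stringsAsFactors = FALSE
  )
)

# A list has no dimensions, but the data frame stored inside it does.
dim(toy_gbif)       # NULL
dim(toy_gbif$data)  # 4 2

# Split each comma-separated string into individual issue codes.
codes <- unlist(strsplit(toy_gbif$data$issues, ",", fixed = TRUE))

# Deduplicate after splitting, so each code is listed only once.
unique_codes <- unique(codes)

# Count how many records carry each code.
code_counts <- table(codes)
```

Note the order of operations: here unique is applied after the split. In the script that follows, unique runs before strsplit, so it removes repeated combinations of codes rather than repeated codes, which is why a code such as cdround can appear more than once in its output.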

# dimensions
dim(flying_gbif)
## NULL
dim(flying_gbif$data)
## [1] 500 152
# check the fields
flying_gbif$data %>% names
##   [1] "key"                              "scientificName"                  
##   [3] "decimalLatitude"                  "decimalLongitude"                
##   [5] "issues"                           "datasetKey"                      
##   [7] "publishingOrgKey"                 "installationKey"                 
##   [9] "publishingCountry"                "protocol"                        
##  [11] "lastCrawled"                      "lastParsed"                      
##  [13] "crawlId"                          "hostingOrganizationKey"          
##  [15] "basisOfRecord"                    "occurrenceStatus"                
##  [17] "lifeStage"                        "taxonKey"                        
##  [19] "kingdomKey"                       "phylumKey"                       
##  [21] "classKey"                         "orderKey"                        
##  [23] "familyKey"                        "genusKey"                        
##  [25] "speciesKey"                       "acceptedTaxonKey"                
##  [27] "acceptedScientificName"           "kingdom"                         
##  [29] "phylum"                           "order"                           
##  [31] "family"                           "genus"                           
##  [33] "species"                          "genericName"                     
##  [35] "specificEpithet"                  "taxonRank"                       
##  [37] "taxonomicStatus"                  "iucnRedListCategory"             
##  [39] "dateIdentified"                   "coordinateUncertaintyInMeters"   
##  [41] "stateProvince"                    "year"                            
##  [43] "month"                            "day"                             
##  [45] "eventDate"                        "modified"                        
##  [47] "lastInterpreted"                  "references"                      
##  [49] "license"                          "isInCluster"                     
##  [51] "datasetName"                      "recordedBy"                      
##  [53] "identifiedBy"                     "inCluster"                       
##  [55] "geodeticDatum"                    "class"                           
##  [57] "countryCode"                      "country"                         
##  [59] "rightsHolder"                     "identifier"                      
##  [61] "http://unknown.org/nick"          "verbatimEventDate"               
##  [63] "gbifID"                           "verbatimLocality"                
##  [65] "collectionCode"                   "occurrenceID"                    
##  [67] "taxonID"                          "catalogNumber"                   
##  [69] "institutionCode"                  "eventTime"                       
##  [71] "occurrenceRemarks"                "http://unknown.org/captive"      
##  [73] "identificationID"                 "identificationRemarks"           
##  [75] "depth"                            "depthAccuracy"                   
##  [77] "http://unknown.org/language"      "locality"                        
##  [79] "http://unknown.org/rights"        "taxonConceptID"                  
##  [81] "http://unknown.org/rightsHolder"  "associatedSequences"             
##  [83] "datasetID"                        "eventID"                         
##  [85] "footprintWKT"                     "county"                          
##  [87] "originalNameUsage"                "identificationVerificationStatus"
##  [89] "nameAccordingTo"                  "networkKeys"                     
##  [91] "individualCount"                  "elevation"                       
##  [93] "elevationAccuracy"                "recordNumber"                    
##  [95] "municipality"                     "language"                        
##  [97] "type"                             "ownerInstitutionCode"            
##  [99] "continent"                        "waterBody"                       
## [101] "habitat"                          "institutionID"                   
## [103] "parentEventID"                    "footprintSRS"                    
## [105] "sex"                              "establishmentMeans"              
## [107] "organismQuantity"                 "organismQuantityType"            
## [109] "institutionKey"                   "collectionKey"                   
## [111] "preparations"                     "samplingProtocol"                
## [113] "nomenclaturalCode"                "higherGeography"                 
## [115] "georeferenceVerificationStatus"   "endDayOfYear"                    
## [117] "fieldNumber"                      "verbatimDepth"                   
## [119] "locationRemarks"                  "startDayOfYear"                  
## [121] "accessRights"                     "bibliographicCitation"           
## [123] "higherClassification"             "collectionID"                    
## [125] "rights"                           "georeferenceSources"             
## [127] "projectId"                        "islandGroup"                     
## [129] "samplingEffort"                   "locationAccordingTo"             
## [131] "higherGeographyID"                "georeferencedDate"               
## [133] "georeferencedBy"                  "island"                          
## [135] "georeferenceProtocol"             "verbatimSRS"                     
## [137] "verbatimCoordinateSystem"         "georeferenceRemarks"             
## [139] "locationID"                       "taxonRemarks"                    
## [141] "disposition"                      "vernacularName"                  
## [143] "fieldNotes"                       "organismName"                    
## [145] "materialSampleID"                 "dynamicProperties"               
## [147] "namePublishedInYear"              "acceptedNameUsage"               
## [149] "parentNameUsage"                  "otherCatalogNumbers"             
## [151] "eventRemarks"                     "name"
# issues identified by the repository
gbif_issues()
##           code                                             issue
## 1          bri                           BASIS_OF_RECORD_INVALID
## 2          ccm                        CONTINENT_COUNTRY_MISMATCH
## 3          cdc                CONTINENT_DERIVED_FROM_COORDINATES
## 4        conti                                 CONTINENT_INVALID
## 5         cdiv                                COORDINATE_INVALID
## 6        cdout                           COORDINATE_OUT_OF_RANGE
## 7        cdrep                            COORDINATE_REPROJECTED
## 8       cdrepf                    COORDINATE_REPROJECTION_FAILED
## 9       cdreps                COORDINATE_REPROJECTION_SUSPICIOUS
## 10     cdround                                COORDINATE_ROUNDED
## 11     cucdmis                       COUNTRY_COORDINATE_MISMATCH
## 12        cudc                  COUNTRY_DERIVED_FROM_COORDINATES
## 13        cuiv                                   COUNTRY_INVALID
## 14         cum                                  COUNTRY_MISMATCH
## 15      depmms                             DEPTH_MIN_MAX_SWAPPED
## 16       depnn                                 DEPTH_NON_NUMERIC
## 17     depnmet                                  DEPTH_NOT_METRIC
## 18      depunl                                    DEPTH_UNLIKELY
## 19       elmms                         ELEVATION_MIN_MAX_SWAPPED
## 20        elnn                             ELEVATION_NON_NUMERIC
## 21      elnmet                              ELEVATION_NOT_METRIC
## 22       elunl                                ELEVATION_UNLIKELY
## 23      gass84                      GEODETIC_DATUM_ASSUMED_WGS84
## 24      gdativ                            GEODETIC_DATUM_INVALID
## 25     iddativ                           IDENTIFIED_DATE_INVALID
## 26    iddatunl                          IDENTIFIED_DATE_UNLIKELY
## 27      mdativ                             MODIFIED_DATE_INVALID
## 28     mdatunl                            MODIFIED_DATE_UNLIKELY
## 29    muldativ                           MULTIMEDIA_DATE_INVALID
## 30    muluriiv                            MULTIMEDIA_URI_INVALID
## 31   preneglat                         PRESUMED_NEGATED_LATITUDE
## 32   preneglon                        PRESUMED_NEGATED_LONGITUDE
## 33     preswcd                       PRESUMED_SWAPPED_COORDINATE
## 34      rdativ                             RECORDED_DATE_INVALID
## 35       rdatm                            RECORDED_DATE_MISMATCH
## 36     rdatunl                            RECORDED_DATE_UNLIKELY
## 37    refuriiv                            REFERENCES_URI_INVALID
## 38    txmatfuz                                 TAXON_MATCH_FUZZY
## 39     txmathi                            TAXON_MATCH_HIGHERRANK
## 40    txmatnon                                  TAXON_MATCH_NONE
## 41   typstativ                               TYPE_STATUS_INVALID
## 42      zerocd                                   ZERO_COORDINATE
## 43        cdpi                      COORDINATE_PRECISION_INVALID
## 44       cdumi             COORDINATE_UNCERTAINTY_METERS_INVALID
## 45       indci                          INDIVIDUAL_COUNT_INVALID
## 46      interr                              INTERPRETATION_ERROR
## 47       iccos INDIVIDUAL_COUNT_CONFLICTS_WITH_OCCURRENCE_STATUS
## 48       osiic  OCCURRENCE_STATUS_INFERRED_FROM_INDIVIDUAL_COUNT
## 49         osu                      OCCURRENCE_STATUS_UNPARSABLE
## 50       geodi                        GEOREFERENCED_DATE_INVALID
## 51       geodu                       GEOREFERENCED_DATE_UNLIKELY
## 52      ambcol                              AMBIGUOUS_COLLECTION
## 53     ambinst                             AMBIGUOUS_INSTITUTION
## 54     colmafu                            COLLECTION_MATCH_FUZZY
## 55     colmano                             COLLECTION_MATCH_NONE
## 56     incomis                   INSTITUTION_COLLECTION_MISMATCH
## 57      inmafu                           INSTITUTION_MATCH_FUZZY
## 58      inmano                            INSTITUTION_MATCH_NONE
## 59     osifbor   OCCURRENCE_STATUS_INFERRED_FROM_BASIS_OF_RECORD
## 60     diffown                       DIFFERENT_OWNER_INSTITUTION
## 61   taxmatagg                             TAXON_MATCH_AGGREGATE
## 62    fpsrsinv                             FOOTPRINT_SRS_INVALID
## 63    fpwktinv                             FOOTPRINT_WKT_INVALID
## 64         anm                             ACCEPTED_NAME_MISSING
## 65        annu                          ACCEPTED_NAME_NOT_UNIQUE
## 66      anuidi                    ACCEPTED_NAME_USAGE_ID_INVALID
## 67    aitidinv                            ALT_IDENTIFIER_INVALID
## 68        bbmn                               BACKBONE_MATCH_NONE
## 69    basauthm                          BASIONYM_AUTHOR_MISMATCH
## 70     bibrinv                             BIB_REFERENCE_INVALID
## 71       chsun                                    CHAINED_SYNOYM
## 72      clasna                        CLASSIFICATION_NOT_APPLIED
## 73     clasroi                 CLASSIFICATION_RANK_ORDER_INVALID
## 74  conbascomb                  CONFLICTING_BASIONYM_COMBINATION
## 75      desinv                               DESCRIPTION_INVALID
## 76      disinv                              DISTRIBUTION_INVALID
## 77         hom                                           HOMONYM
## 78        minv                                MULTIMEDIA_INVALID
## 79         npm                              NAME_PARENT_MISMATCH
## 80          ns                                        NO_SPECIES
## 81       nsinv                      NOMENCLATURAL_STATUS_INVALID
## 82       onder                             ORIGINAL_NAME_DERIVED
## 83        onnu                          ORIGINAL_NAME_NOT_UNIQUE
## 84    onuidinv                    ORIGINAL_NAME_USAGE_ID_INVALID
## 85          ov                              ORTHOGRAPHIC_VARIANT
## 86          pc                                      PARENT_CYCLE
## 87        pnnu                            PARENT_NAME_NOT_UNIQUE
## 88    pnuidinv                      PARENT_NAME_USAGE_ID_INVALID
## 89          pp                                PARTIALLY_PARSABLE
## 90         pbg                            PUBLISHED_BEFORE_GENUS
## 91     rankinv                                      RANK_INVALID
## 92     relmiss                              RELATIONSHIP_MISSING
## 93       scina                         SCIENTIFIC_NAME_ASSEMBLED
## 94     spprinv                           SPECIES_PROFILE_INVALID
## 95    taxstinv                          TAXONOMIC_STATUS_INVALID
## 96    taxstmis                         TAXONOMIC_STATUS_MISMATCH
## 97      unpars                                        UNPARSABLE
## 98 vernnameinv                           VERNACULAR_NAME_INVALID
## 99  backmatagg                          BACKBONE_MATCH_AGGREGATE
##                                                                                                                                                              description
## 1                                                           The given basis of record is impossible to interpret or seriously different from the recommended vocabulary.
## 2                                                                                                                 The interpreted continent and country do not match up.
## 3                                                                            The interpreted continent is based on the coordinates, not the verbatim string information.
## 4                                                                                                                                Uninterpretable continent values found.
## 5                                                                                                Coordinate value given in some form but GBIF is unable to interpret it.
## 6                                                                                                  Coordinate has invalid lat/lon values out of their decimal max range.
## 7                                                                         The original coordinate was successfully reprojected from a different geodetic datum to WGS84.
## 8                                                                The given decimal latitude and longitude could not be reprojected to WGS84 based on the provided datum.
## 9                          Indicates successful coordinate reprojection according to provided datum, but which results in a datum shift larger than 0.1 decimal degrees.
## 10                                                                                                               Original coordinate modified by rounding to 5 decimals.
## 11                                                                                         The interpreted occurrence coordinates fall outside of the indicated country.
## 12                                                                             The interpreted country is based on the coordinates, not the verbatim string information.
## 13                                                                                                                                 Uninterpretable country values found.
## 14                                                                                        Interpreted country for dwc:country and dwc:countryCode contradict each other.
## 15                                                                                                                                               Set if supplied min>max
## 16                                                                                                                                   Set if depth is a non numeric value
## 17                                                                     Set if supplied depth is not given in the metric system, for example using feet instead of meters
## 18                                                                                                                      Set if depth is larger than 11.000m or negative.
## 19                                                                                                                                   Set if supplied min > max elevation
## 20                                                                                                                               Set if elevation is a non numeric value
## 21                                                                 Set if supplied elevation is not given in the metric system, for example using feet instead of meters
## 22                                                                                      Set if elevation is above the troposphere (17km) or below 11km (Mariana Trench).
## 23                              Indicating that the interpreted coordinates assume they are based on WGS84 datum as the datum was either not indicated or interpretable.
## 24                                                                                                                    The geodetic datum given could not be interpreted.
## 25                                                                                      The date given for dwc:dateIdentified is invalid and cant be interpreted at all.
## 26                                                                                The date given for dwc:dateIdentified is in the future or before Linnean times (1700).
## 27                                                              A (partial) invalid date is given for dc:modified, such as a non existing date, invalid zero month, etc.
## 28                                                                                         The date given for dc:modified is in the future or predates unix time (1970).
## 29                                                                                                       An invalid date is given for dc:created of a multimedia object.
## 30                                                                                                                      An invalid uri is given for a multimedia object.
## 31                                                                                                            Latitude appears to be negated, e.g. 32.3 instead of -32.3
## 32                                                                                                           Longitude appears to be negated, e.g. 32.3 instead of -32.3
## 33                                                                                                                          Latitude and longitude appear to be swapped.
## 34                                                                              A (partial) invalid date is given, such as a non existing date, invalid zero month, etc.
## 35                                                           The recording date specified as the eventDate string and the individual year, month, day are contradicting.
## 36                        The recording date is highly unlikely, falling either into the future or represents a very old date before 1600 that predates modern taxonomy.
## 37                                                                                                                            An invalid uri is given for dc:references.
## 38                                                                                   Matching to the taxonomic backbone can only be done using a fuzzy, non exact match.
## 39                                                                     Matching to the taxonomic backbone can only be done on a higher rank and not the scientific name.
## 40       Matching to the taxonomic backbone cannot be done cause there was no match at all or several matches with too little information to keep them apart (homonyms).
## 41                                                              The given type status is impossible to interpret or seriously different from the recommended vocabulary.
## 42                                                                                       Coordinate is the exact 0/0 coordinate, often indicating a bad null coordinate.
## 43                                                                                                             Indicates an invalid or very unlikely coordinatePrecision
## 44                                                                                                        Indicates an invalid or very unlikely dwc:uncertaintyInMeters.
## 45                                                                                                                  Individual count value not parsable into an integer.
## 46                                                                                An error occurred during interpretation, leaving the record interpretation incomplete.
## 47                                                                                         Example: individual count value > 0, but occurrence status is absent and etc.
## 48                                                                                                        Occurrence status was inferred from the individual count value
## 49                                                                                                         Occurrence status value can't be assigned to OccurrenceStatus
## 50                                                                                  The date given for dwc:georeferencedDate is invalid and can't be interpreted at all.
## 51                                                                             The date given for dwc:georeferencedDate is in the future or before Linnean times (1700).
## 52                                                                                                   The given collection matches with more than 1 GrSciColl collection.
## 53                                                                                                 The given institution matches with more than 1 GrSciColl institution.
## 54                                                                                                   The given collection was fuzzily matched to a GrSciColl collection.
## 55                                                                                               The given collection couldn't be matched with any GrSciColl collection.
## 56                                                                                                     The collection matched doesn't belong to the institution matched.
## 57                                                                                                 The given institution was fuzzily matched to a GrSciColl institution.
## 58                                                                                             The given institution couldn't be matched with any GrSciColl institution.
## 59                                                                                                                  Occurrence status was inferred from basis of records
## 60 The given owner institution is different than the given institution. Therefore we assume it doesn't belong to the institution and we don't link it to the occurrence.
## 61                Matching to the taxonomic backbone can only be done on a species level, but the occurrence was in fact considered a broader species aggregate/complex.
## 62                                                                                                 The Footprint Spatial Reference System given could not be interpreted
## 63                                                                                                          The Footprint Well-Known-Text given could not be interpreted
## 64                                                                                                                                     Synonym lacking an accepted name.
## 65                                                                               Synonym has a verbatim accepted name which is not unique and refers to several records.
## 66                                                                                                          The value for dwc:acceptedNameUsageID could not be resolved.
## 67                                                                          At least one alternative identifier extension record attached to this name usage is invalid.
## 68                                                                                                                 Name usage could not be matched to the GBIF backbone.
## 69                                                                     The authorship of the original name does not match the authorship in brackets of the actual name.
## 70                                                                         At least one bibliographic reference extension record attached to this name usage is invalid.
## 71                                                                                   If a synonym points to another synonym as its accepted taxon the chain is resolved.
## 72                                                                                               The denormalized classification could not be applied to the name usage.
## 73                                                                    The given ranks of the names in the classification hierarchy do not follow the hierarchy of ranks.
## 74                                                                                 There have been more than one accepted name in a homotypical basionym group of names.
## 75                                                                                     At least one description extension record attached to this name usage is invalid.
## 76                                                                                    At least one distribution extension record attached to this name usage is invalid.
## 77                                                   A not synonymized homonym exists for this name in some other backbone source which have been ignored at build time.
## 78                                                                                      At least one multimedia extension record attached to this name usage is invalid.
## 79                                               The (accepted) bi/trinomial name does not match the parent name and should be recombined into the parent genus/species.
## 80                                                           The group (currently only genera are tested) are lacking any accepted species GBIF backbone specific issue.
## 81                                                                                                                      dwc:nomenclaturalStatus could not be interpreted
## 82                     Record has a original name (basionym) relationship which was derived from name & authorship comparison, but did not exist explicitly in the data.
## 83                                                                     Record has a verbatim original name (basionym) which is not unique and refers to several records.
## 84                                                                                                          The value for dwc:originalNameUsageID could not be resolved.
## 85                                                                                                              A potential orthographic variant exists in the backbone.
## 86                                                                                 The child parent classification resulted into a cycle that needed to be resolved/cut.
## 87                                                                                  Record has a verbatim parent name which is not unique and refers to several records.
## 88                                                                                                            The value for dwc:parentNameUsageID could not be resolved.
## 89                                    The beginning of the scientific name string was parsed, but there is additional information in the string that was not understood.
## 90                                                                                            A bi/trinomial name published earlier than the parent genus was published.
## 91                                                                                                                                dwc:taxonRank could not be interpreted
## 92                                                                                                   There were problems representing all name usage relationships, i.e.
## 93                                                                     The scientific name was assembled from the individual name parts and not given as a whole string.
## 94                                                                                 At least one species profile extension record attached to this name usage is invalid.
## 95                                                                                                                          dwc:taxonomicStatus could not be interpreted
## 96                                                                                                                                                        no description
## 97                                                                   The scientific name string could not be parsed at all, but appears to be a parsable name type, i.e.
## 98                                                                                 At least one vernacular name extension record attached to this name usage is invalid.
## 99                                                     Name usage could only be matched to a GBIF backbone species, but was in fact a broader species aggregate/complex.
##          type
## 1  occurrence
## 2  occurrence
## 3  occurrence
## 4  occurrence
## 5  occurrence
## 6  occurrence
## 7  occurrence
## 8  occurrence
## 9  occurrence
## 10 occurrence
## 11 occurrence
## 12 occurrence
## 13 occurrence
## 14 occurrence
## 15 occurrence
## 16 occurrence
## 17 occurrence
## 18 occurrence
## 19 occurrence
## 20 occurrence
## 21 occurrence
## 22 occurrence
## 23 occurrence
## 24 occurrence
## 25 occurrence
## 26 occurrence
## 27 occurrence
## 28 occurrence
## 29 occurrence
## 30 occurrence
## 31 occurrence
## 32 occurrence
## 33 occurrence
## 34 occurrence
## 35 occurrence
## 36 occurrence
## 37 occurrence
## 38 occurrence
## 39 occurrence
## 40 occurrence
## 41 occurrence
## 42 occurrence
## 43 occurrence
## 44 occurrence
## 45 occurrence
## 46 occurrence
## 47 occurrence
## 48 occurrence
## 49 occurrence
## 50 occurrence
## 51 occurrence
## 52 occurrence
## 53 occurrence
## 54 occurrence
## 55 occurrence
## 56 occurrence
## 57 occurrence
## 58 occurrence
## 59 occurrence
## 60 occurrence
## 61 occurrence
## 62 occurrence
## 63 occurrence
## 64       name
## 65       name
## 66       name
## 67       name
## 68       name
## 69       name
## 70       name
## 71       name
## 72       name
## 73       name
## 74       name
## 75       name
## 76       name
## 77       name
## 78       name
## 79       name
## 80       name
## 81       name
## 82       name
## 83       name
## 84       name
## 85       name
## 86       name
## 87       name
## 88       name
## 89       name
## 90       name
## 91       name
## 92       name
## 93       name
## 94       name
## 95       name
## 96       name
## 97       name
## 98       name
## 99       name
# check the issues found in the dataset
flying_gbif$data$issues %>% 
  unique() %>% 
  strsplit(., "[,]") %>% 
  unlist()
##  [1] "cdround"   "cdround"   "cudc"      "gass84"    "cum"       "gass84"   
##  [7] "fpwktinv"  "gass84"    "osiic"     "diffown"   "gass84"    "gdativ"   
## [13] "colmafu"   "inmafu"    "cdreps"    "cdround"   "inmano"    "conti"    
## [19] "gass84"    "cdround"   "osiic"     "cdround"   "depmms"    "typstativ"
## [25] "osiic"     "inmano"    "cdround"   "gass84"    "cudc"      "gass84"   
## [31] "cdround"   "depmms"    "elmms"     "typstativ" "osiic"     "inmano"   
## [37] "gass84"    "gdativ"    "refuriiv"
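Because unique() runs before the strings are split, the same code (for example "cdround") still appears several times in the result above. A minimal sketch of one way to list each code once and count how many records carry it is shown here. The vector issues_example is hypothetical and only mimics the comma-separated strings stored in flying_gbif$data$issues.

```r
# Hypothetical example vector (an assumption, not the real GBIF download);
# the values mimic the comma-separated codes in flying_gbif$data$issues.
issues_example <- c("cdround", "cdround,cudc", "gass84", "",
                    "cdround,gass84", "gass84")

# Split every record on commas and flatten into one character vector.
issue_codes <- unlist(strsplit(issues_example, ",", fixed = TRUE))

# Drop empty entries, if any, coming from records without issues.
issue_codes <- issue_codes[nzchar(issue_codes)]

# Each code listed once.
unique_codes <- unique(issue_codes)

# Number of records flagged with each code, most frequent first.
issue_counts <- sort(table(issue_codes), decreasing = TRUE)

print(unique_codes)
print(issue_counts)
```

On the real data, the same steps could be applied to flying_gbif$data$issues in place of issues_example.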

The dataset returned by GBIF contains many variables, but we will use only 15 of them to start our analysis. We select the variables of interest with the dplyr package, which produces a data frame with 500 observations, each corresponding to one occurrence of the species. We then apply the distinct function to remove duplicated observations, which reduces the data frame to 461 observations. Finally, we apply the unique function to each column, through lapply, to check the values that each variable takes, as in the script and output below:
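Before running these steps on the GBIF data, the different roles of the two functions can be seen on a toy data frame. This is only a sketch: the object toy and its values are hypothetical, and the column names simply follow the GBIF fields used in this tutorial.

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration only);
# the first two rows are exact duplicates.
toy <- data.frame(
  decimalLatitude  = c(24.66, 24.66, -22.94, 39.45),
  decimalLongitude = c(-82.85, -82.85, -42.29, 2.74),
  basisOfRecord    = c("HUMAN_OBSERVATION", "HUMAN_OBSERVATION",
                       "HUMAN_OBSERVATION", "PRESERVED_SPECIMEN"),
  stringsAsFactors = FALSE
)

# distinct() drops rows that are identical across all columns.
toy_unique <- toy %>% distinct()
print(nrow(toy))
print(nrow(toy_unique))

# lapply() with unique() lists the values found in each column,
# which is a quick way to spot unexpected categories.
toy_levels <- lapply(toy_unique, unique)
print(toy_levels$basisOfRecord)
```

In short, distinct() changes the number of rows, while lapply() with unique() only inspects the columns and leaves the data frame untouched. The script applied to the GBIF data follows.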

# select the variables of interest
flying_gbif1 <- flying_gbif$data %>%
  dplyr::select(scientificName, acceptedScientificName, decimalLatitude, decimalLongitude,
                issues, waterBody, basisOfRecord, occurrenceStatus, rightsHolder, 
                datasetName, recordedBy, depth, locality, habitat, year) 
# keep only unique occurrences (drop duplicated rows)
flying_gbif1 <- flying_gbif1 %>% 
  distinct() 
# check the unique values of each variable
lapply(flying_gbif1, unique)
## $scientificName
## [1] "Dactylopterus volitans (Linnaeus, 1758)"
## [2] "BOLD:AAB9833"                           
## 
## $acceptedScientificName
## [1] "Dactylopterus volitans (Linnaeus, 1758)"
## [2] "BOLD:AAB9833"                           
## 
## $decimalLatitude
##   [1] -22.938775 -22.994415  37.084909  18.472243 -23.014619  21.144459
##   [7]  12.604672  39.360083  11.268465  39.448878  39.448931  14.029178
##  [13]  39.088752  39.449634  42.368025  39.448435  39.447788  12.521525
##  [19]  36.720843  41.421000  18.428530  40.052190  38.091298 -34.684100
##  [25]  40.193427 -38.551050  38.731004  17.634067  17.633238  41.387356
##  [31]  44.657141  17.620576   9.636069  34.617847  21.290365  41.916332
##  [37]  12.133924 -22.990787  22.120637  19.351058  12.518484  42.365361
##  [43]  43.680367  43.687389  40.408868  39.132195  12.521620  26.782984
##  [49]  35.934040  42.538126  42.536637  42.536417  11.154870  36.996504
##  [55]  45.528336  44.632480  43.318600  43.320000 -23.033885  20.485278
##  [61]  17.891213  39.151353   9.255185  26.782915  43.781012  18.708028
##  [67]  41.916813  43.229036  15.773260  40.377449  20.472701  39.080793
##  [73]  38.624125  42.214137  19.271623  12.110460  12.143384  26.783231
##  [79]  12.117240  26.783365  14.729990  12.121913  12.023785 -17.828611
##  [85] -17.759444 -23.124924  11.219170  11.273350  41.916694  41.916774
##  [91]  41.916642  39.536000  39.466940  12.318043  38.619478  39.202068
##  [97]  11.207970  41.862614  26.783003  28.443300  20.484722  12.555902
## [103]   4.219720  12.566334 -23.109408 -23.169070  17.482080 -17.751944
## [109]  40.361210  18.341895  43.780304  17.914532  18.216815  14.756833
## [115]  14.655996   0.403352   0.403318  12.008061  18.218325  18.219410
## [121]  36.020300  15.793972  36.738936  13.067341  13.069592 -22.956500
## [127] -23.005735  14.443092  39.791512  15.554900  24.041212 -19.749167
## [133]  12.318116  12.021200  40.881574  47.522683  12.536642  16.234670
## [139]  10.309271  24.030160  20.809369  16.098915 -23.971500  39.529000
## [145] -13.008600  14.785620  36.018898  36.018153  38.500000  38.510000
## [151]  38.620000  39.486000  38.440000  41.917222  36.019900  39.833052
## [157]  31.518349  38.560000  15.277094   9.499519  35.204700  24.068598
## [163]  12.467493 -23.148705  16.308940  41.570700  20.866853  10.220227
## [169]  24.661640  24.661470  24.722690  24.722610  24.723970  24.723890
## [175]  24.578580  24.578540  24.582760  24.582630  24.584000  24.583940
## [181]  24.591300  24.718970  24.719050  24.591220  24.723940  24.722510
## [187]  24.722460  24.724440  24.722470  24.722540  24.724310  24.731380
## [193]  24.724370  24.724070  24.732960  24.724260  24.732870  24.594170
## [199]  24.597540  24.597590  24.594210  24.598170  24.598120  24.712570
## [205]  24.618170  24.712490  24.626950  24.626910  24.634810  24.634750
## [211]  24.719440  24.623900  24.720970  24.623970  24.721170  24.722440
## [217]  24.722450  24.573360  24.573420  24.594670  24.594780  24.592850
## [223]  24.592810  24.595440  24.587400  24.593090  24.598590  24.589100
## [229]  24.589130  24.595550  24.587450  24.678200  24.667080  24.666920
## [235]  24.661570  24.678280  24.661220  24.581490  24.581520  24.581100
## [241]  24.592630  24.592550  24.581030  24.723150  24.671770  24.675300
## [247]  24.675260  24.712840  24.712940  24.570740  24.570700  24.629140
## [253]  24.629170  24.658610  24.658620  24.660400  24.660540  24.666410
## [259]  24.666250  24.669310  24.669070  24.670300  24.670440  24.690700
## [265]  24.690880  24.715390  24.715450  24.624960  24.625060  24.628800
## [271]  24.628720  24.630980  24.705400  24.705470  24.712410  24.712510
## [277]  24.720680  24.720660  24.726650  24.726690  24.726240  24.726340
## [283]  24.728450  24.728420  24.726830  24.729810  24.729870  24.634340
## [289]  24.637600  24.634450  24.662500  24.662540  24.663940  24.664000
## [295]  24.688820  24.683420  24.683300  24.688760  24.626220  24.633200
## [301]  24.633290  24.626150  24.639760  24.639840  24.644540  24.626640
## [307]  24.626690  24.644550  24.680100  24.718360  24.688360  24.680130
## [313]  24.688370  24.718470  24.613550  24.641820  24.613470  24.641710
## [319]  24.611070  24.610980  24.715680  24.715730  24.717880  24.717810
## [325]  24.602050  24.602060  24.623240  24.623180  24.632980  24.632840
## [331]  24.640620  24.640760  24.686800  24.686910  24.641870  24.642780
## [337]  24.642750  24.643400  24.643490  24.618770  24.709050  24.709010
## [343]  24.618660  24.629600  24.629660  24.635120  24.635170  24.640250
## [349]  24.640310  24.648390  24.648450  24.645620  24.645500  24.616960
## [355]  24.615700  24.615640  24.616910  24.620240  24.620250  24.614300
## [361]  24.614500  24.655710  24.655750  24.618270  24.618400  24.632160
## [367]  24.632360  24.634960  24.636300  24.646890  24.646780  24.679500
## [373]  24.679680  24.707350  24.707420  24.677440  24.677420  24.681640
## [379]  24.681580  24.681970  24.682920  24.682040  24.682810  24.682390
## [385]  24.682350  24.686110  24.715900  24.715990  24.680310  24.680210
## [391]  24.681030  24.681100  24.713740  24.719590  24.713700  24.719490
## [397]  24.694820  24.694880  24.721320  24.721030  24.685960  24.699120
## [403]  24.652580  24.661020  24.684940  24.700840  24.723200  24.733070
## [409]  24.643610  24.629000  24.642850  24.638950  24.711350  24.711770
## [415]  24.653780  24.653840  24.622320  24.713300  24.651740  24.684950
## [421]  24.697700  24.668950  24.663050  24.663220  24.720710  24.651110
## [427]  24.686030  24.708100  24.661130  24.718390  24.587540  24.616750
## [433]  24.677060  24.638870  24.711190  24.683890  24.645440
## 
## $decimalLongitude
##   [1] -42.288895 -43.176417   6.462393 -67.168947 -44.279129 -86.793512
##   [7] -70.052447   3.225275 -74.195602   2.742134   2.742160 -16.771705
##  [13]  23.698859   2.740545  10.877323   2.741765   2.742548 -81.741386
##  [19]  -3.728723   9.106261 -77.153114   4.053591  20.569633 -54.278187
##  [25]  23.684955 -58.565856   1.392308 -63.263442 -63.259275  12.909748
##  [31] -63.944251 -63.264091 -82.660371  33.005539 -89.666009   3.209454
##  [37] -68.985271 -43.165480 -81.119334 -81.270147 -81.729409  18.662894
##  [43]   7.237969   7.246095  17.209440   9.472741 -69.968338 -80.044254
##  [49]  14.344284   3.059085   3.058447   3.058573 -74.227787   7.183845
##  [55] -60.973655 -63.934948  -8.763800  -8.760000 -43.202450 -86.972449
##  [61] -62.848508  23.323827 -82.142522 -80.044685   7.648342 -87.707885
##  [67]   3.208984   5.347821 -87.465350  17.297248 -86.993588  17.127110
##  [73]  20.604304   3.125282 -81.313130 -68.983968 -69.026961 -80.044444
##  [79] -68.966977 -80.040597 -61.180650 -68.969679 -61.764437 -39.200278
##  [85] -39.181667 -44.278558 -74.240580 -74.206970   3.208550   3.208636
##  [91]   3.208708   2.376600  26.208938 -69.151315  15.826300  23.338677
##  [97] -74.239700   3.166441 -80.044070 -84.516100 -86.951005 -81.716394
## [103] 118.695000 -81.690855 -44.230542 -44.144081 -62.990555 -39.181389
## [109]  17.312072 -64.978137   7.652152 -71.672745 -70.562675 -60.965848
## [115] -61.154659   6.690227   6.690225 -61.794923 -70.560572 -70.564241
## [121]  14.271400 -79.848611  27.994558 -59.602013 -59.584265 -42.833389
## [127] -44.315759 -60.881495   2.692984 -61.464900 -74.527473 -39.697500
## [133] -69.151289 -61.799900 -72.495531 -61.577783 -81.718098 -61.353560
## [139] -75.582339 -74.528160 -86.848397 -86.896032 -46.334801   2.439300
## [145] -38.533800 -61.217640  14.272312  14.272635  -0.120000   0.030000
## [151]   0.010000   0.040000   2.476800  -0.330000   3.207778  14.271800
## [157]   3.122341  34.428866  -0.040000 -23.766157 -78.726826  25.722100
## [163] -74.533155 -61.499668 -44.349789 -61.798440   2.541830 -86.867317
## [169] -75.616706 -82.847520 -82.847430 -82.831270 -82.831200 -82.813690
## [175] -82.813520 -82.918530 -82.918400 -82.911790 -82.911770 -82.903100
## [181] -82.902950 -82.876090 -82.774430 -82.774380 -82.876110 -82.783870
## [187] -82.796570 -82.796740 -82.806990 -82.784900 -82.785130 -82.784210
## [193] -82.796010 -82.774020 -82.790980 -82.774540 -82.795970 -82.790760
## [199] -82.795690 -82.912680 -82.869950 -82.869970 -82.912480 -82.915920
## [205] -82.915720 -82.794320 -82.865700 -82.865560 -82.794170 -82.855350
## [211] -82.855450 -82.858030 -82.857990 -82.795840 -82.859710 -82.835100
## [217] -82.859740 -82.835050 -82.788390 -82.788310 -82.911040 -82.911200
## [223] -82.878070 -82.877920 -82.885320 -82.885200 -82.873020 -82.899200
## [229] -82.909330 -82.916060 -82.894320 -82.894200 -82.899110 -82.823430
## [235] -82.834480 -82.834520 -82.845050 -82.823000 -82.844900 -82.907820
## [241] -82.907930 -82.888020 -82.871990 -82.871940 -82.887880 -82.774870
## [247] -82.817310 -82.817360 -82.826040 -82.825990 -82.791960 -82.792860
## [253] -82.906360 -82.906250 -82.965510 -82.965620 -82.938050 -82.938130
## [259] -82.937480 -82.937570 -82.929720 -82.929780 -82.928050 -82.927870
## [265] -82.927530 -82.927660 -82.886050 -82.885860 -82.897410 -82.897360
## [271] -82.900570 -82.900510 -82.897120 -82.897170 -82.968200 -82.872070
## [277] -82.872130 -82.888000 -82.888130 -82.854920 -82.855000 -82.810020
## [283] -82.810100 -82.805160 -82.805400 -82.804280 -82.804490 -82.812630
## [289] -82.804400 -82.801030 -82.801020 -82.961240 -82.957590 -82.961250
## [295] -82.852110 -82.851980 -82.932910 -82.932970 -82.886230 -82.897290
## [301] -82.897180 -82.886340 -82.854140 -82.854530 -82.854580 -82.854060
## [307] -82.961740 -82.961630 -82.928960 -82.899270 -82.899180 -82.929160
## [313] -82.904600 -82.849830 -82.884100 -82.884040 -82.849910 -82.902300
## [319] -82.929760 -82.902340 -82.929750 -82.902540 -82.902530 -82.865840
## [325] -82.865680 -82.859320 -82.859090 -82.910730 -82.910830 -82.901760
## [331] -82.901580 -82.966800 -82.966880 -82.966980 -82.967010 -82.907520
## [337] -82.907490 -82.965200 -82.965300 -82.969270 -82.969380 -82.966050
## [343] -83.077900 -82.906320 -82.906210 -83.077850 -83.101550 -83.101370
## [349] -83.101950 -83.101960 -83.103010 -83.103080 -83.031310 -83.031220
## [355] -83.077600 -83.077990 -83.079000 -83.077490 -83.077620 -83.079090
## [361] -83.084080 -83.084310 -83.066360 -83.066450 -83.085220 -83.085210
## [367] -83.062910 -83.062850 -83.095730 -83.095740 -83.066100 -83.060090
## [373] -83.065890 -83.060050 -83.031230 -83.030900 -83.030930 -82.985800
## [379] -82.985920 -83.017710 -83.017550 -83.031840 -83.031930 -83.032530
## [385] -83.043230 -83.032580 -83.043430 -83.011990 -83.012060 -83.050770
## [391] -83.050650 -82.982190 -82.982200 -83.017530 -83.034020 -83.033720
## [397] -82.994270 -82.984620 -82.994370 -82.984540 -83.005380 -83.005460
## [403] -82.983490 -82.983320 -82.985050 -83.013470 -83.018010 -83.011050
## [409] -83.010950 -82.990150 -82.996780 -83.005390 -82.827840 -82.791600
## [415] -82.858390 -82.930790 -82.932120 -82.883690 -82.901850 -83.101510
## [421] -83.101430 -83.060020 -82.993290 -83.007560 -83.032600 -83.014460
## [427] -83.093110 -83.095850 -83.095750 -82.985680 -83.002410 -83.013540
## [433] -82.984820 -82.990080 -82.996850 -82.841850 -82.907350 -82.866620
## [439] -82.826650 -82.932050 -82.883880 -82.906270 -83.074930
## 
## $issues
##  [1] ""                                           
##  [2] "cdround"                                    
##  [3] "cdround,cudc"                               
##  [4] "gass84"                                     
##  [5] "cum,gass84,fpwktinv"                        
##  [6] "gass84,osiic,diffown"                       
##  [7] "gass84,gdativ,colmafu,inmafu"               
##  [8] "cdreps"                                     
##  [9] "cdround,inmano"                             
## [10] "conti,gass84"                               
## [11] "cdround,osiic"                              
## [12] "cdround,depmms,typstativ,osiic,inmano"      
## [13] "cdround,gass84"                             
## [14] "cudc,gass84"                                
## [15] "cdround,depmms,elmms,typstativ,osiic,inmano"
## [16] "gass84,gdativ,refuriiv"                     
## 
## $waterBody
##  [1] NA                                          
##  [2] "Mar Caribe"                                
##  [3] "Gulf of Mexico"                            
##  [4] "Celebes Sea"                               
##  [5] "Mer Méditerranée"                          
##  [6] "Atlantic, Caribbean Sea, Prince Rupert Bay"
##  [7] "Caribbean"                                 
##  [8] "South Atlantic Ocean"                      
##  [9] "Mediterranean Sea"                         
## [10] "Mirabello bay"                             
## [11] "Mediterranean"                             
## 
## $basisOfRecord
## [1] "HUMAN_OBSERVATION"   "MATERIAL_SAMPLE"     "PRESERVED_SPECIMEN" 
## [4] "MACHINE_OBSERVATION"
## 
## $occurrenceStatus
## [1] "PRESENT" "ABSENT" 
## 
## $rightsHolder
##   [1] "John Christopher"                                                          
##   [2] "Eric Fischer Rempe"                                                        
##   [3] "Karim Haddad"                                                              
##   [4] "dtsapalas"                                                                 
##   [5] "claudia_mermelstein"                                                       
##   [6] "Andres Garcia"                                                             
##   [7] "Caroline Synakowski"                                                       
##   [8] "Luis Pérez Berrocal"                                                       
##   [9] "Julian Alzate"                                                             
##  [10] "Lucy Keith-Diagne"                                                         
##  [11] "marieta55"                                                                 
##  [12] "Valentin Moser"                                                            
##  [13] "Ana Castrillon"                                                            
##  [14] "ictio"                                                                     
##  [15] "Julien Renoult"                                                            
##  [16] "Sheldon Logan"                                                             
##  [17] "Frédéric ANDRE"                                                            
##  [18] "Falk Viczian Solarboot-Projekte gGmbH"                                     
##  [19] "Martin Coronel Varela"                                                     
##  [20] "George Manavopoulos"                                                       
##  [21] "Edgar Romeo"                                                               
##  [22] "ocean_explorers"                                                           
##  [23] "terence zahner"                                                            
##  [24] "Dino Biancolini"                                                           
##  [25] "daisymagoo"                                                                
##  [26] "Julius Stölzle"                                                            
##  [27] "marinakyprianou"                                                           
##  [28] "Jay"                                                                       
##  [29] "josepvilanova"                                                             
##  [30] "Paweł Pieluszyński"                                                        
##  [31] "James Telford"                                                             
##  [32] "portiadog"                                                                 
##  [33] "xaviervilport"                                                             
##  [34] "Павел Несмеянов"                                                           
##  [35] "tmenut"                                                                    
##  [36] "Giulia P"                                                                  
##  [37] "Paola Casale"                                                              
##  [38] "oversteegen"                                                               
##  [39] "Pauline Walsh Jacobson"                                                    
##  [40] "maltamama"                                                                 
##  [41] "Xavier Rufray"                                                             
##  [42] "Sylvain Le Bris"                                                           
##  [43] "forajido"                                                                  
##  [44] "chris2184"                                                                 
##  [45] "hunterefs"                                                                 
##  [46] NA                                                                          
##  [47] "Austen Novis"                                                              
##  [48] "Carlo R Sanchez"                                                           
##  [49] "Karl Questel"                                                              
##  [50] "Chris Taklis"                                                              
##  [51] "stacebird"                                                                 
##  [52] "Chiaramonte family"                                                        
##  [53] "Idlegrraphics"                                                             
##  [54] "George Allen"                                                              
##  [55] "Neil DeMaster"                                                             
##  [56] "frahome"                                                                   
##  [57] "Papageorgiou Nikolaos"                                                     
##  [58] "Bernat Garrigós"                                                           
##  [59] "Rachel Andres-Beck"                                                        
##  [60] "apedretti"                                                                 
##  [61] "mbrunetto"                                                                 
##  [62] "sandranap"                                                                 
##  [63] "Werner de Gier"                                                            
##  [64] "sflott"                                                                    
##  [65] "MBML-Peixes - Coleção de Peixes"                                           
##  [66] "renata lepage"                                                             
##  [67] "MoAm SAS | CORPAMAG"                                                       
##  [68] "Merav Vonshak"                                                             
##  [69] "vanilendil"                                                                
##  [70] "silvio_lioce1"                                                             
##  [71] "mothpup"                                                                   
##  [72] "Rahul Joshi"                                                               
##  [73] "Diveboard"                                                                 
##  [74] "Alejandro"                                                                 
##  [75] "Dicabu"                                                                    
##  [76] "Steve Kastner"                                                             
##  [77] "gianfrs"                                                                   
##  [78] "John C."                                                                   
##  [79] "Patricia Torres Pineda"                                                    
##  [80] "Francesco Cecere"                                                          
##  [81] "jbelback"                                                                  
##  [82] "Ana Carolina Hernández-Oquet"                                              
##  [83] "INVEMAR"                                                                   
##  [84] "Alphons"                                                                   
##  [85] "Curtis Irving"                                                             
##  [86] "João D'Andretta"                                                           
##  [87] "Erika Mitchell"                                                            
##  [88] "Antoni López-Arenas i Cama"                                                
##  [89] "maycl"                                                                     
##  [90] "Stichting Observation International"                                       
##  [91] "agardner"                                                                  
##  [92] "Claude Nozères"                                                            
##  [93] "Carlos Gonzalez"                                                           
##  [94] "Greg Lawrence"                                                             
##  [95] "Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO)"
##  [96] "bronyaur"                                                                  
##  [97] "heliastes21"                                                               
##  [98] "Anita Sprungk"                                                             
##  [99] "Mika Tomta"                                                                
## [100] "Carmelo López Abad"                                                        
## [101] "bensonk"                                                                   
## [102] "Alex Ward"                                                                 
## [103] "José Valério"                                                              
## [104] "gamesman02"                                                                
## 
## $datasetName
##  [1] "iNaturalist research-grade observations"                                                        
##  [2] NA                                                                                               
##  [3] "MBML-Peixes - Coleção de Peixes"                                                                
##  [4] "CORPAMAG-MOAM-CONTRATO260-2017-ARRECIFES-PASTOS-2018"                                           
##  [5] "Global Marine biodiversity data from Seawatchers Marine Citizen Science Platform 1980-2020"     
##  [6] "Diveboard - Scuba diving citizen science"                                                       
##  [7] "Biodiversidad íctica de la Isla Cayo Serranilla Expedición Seaflower 2017 Proyecto Colombia BIO"
##  [8] "Base de dados da Coleção Ictiológica do LEP-UFRRJ"                                              
##  [9] "NMNH Extant Biology"                                                                            
## [10] "NMNH Material Samples (USNM)"                                                                   
## [11] "Inventario ictiofaunístico de los humedales de Puerto Morelos, Quintana Roo"                    
## [12] "Colección de referencia de otolitos, Instituto de Ciencias del Mar-CSIC"                        
## [13] "Dry Tortugas Reef Visual Census 2014"                                                           
## 
## $recordedBy
##   [1] "John Christopher"                                                                       
##   [2] "Eric Fischer Rempe"                                                                     
##   [3] "Karim Haddad"                                                                           
##   [4] "dtsapalas"                                                                              
##   [5] "claudia_mermelstein"                                                                    
##   [6] "Andres Garcia"                                                                          
##   [7] "Caroline Synakowski"                                                                    
##   [8] "Luis Pérez Berrocal"                                                                    
##   [9] "Julian Alzate"                                                                          
##  [10] "Lucy Keith-Diagne"                                                                      
##  [11] "marieta55"                                                                              
##  [12] "Valentin Moser"                                                                         
##  [13] "Ana Castrillon"                                                                         
##  [14] "ictio"                                                                                  
##  [15] "Julien Renoult"                                                                         
##  [16] "Sheldon Logan"                                                                          
##  [17] "Frédéric ANDRE"                                                                         
##  [18] "Falk Viczian Solarboot-Projekte gGmbH"                                                  
##  [19] "Martin Coronel Varela"                                                                  
##  [20] "George Manavopoulos"                                                                    
##  [21] "Edgar Romeo"                                                                            
##  [22] "ocean_explorers"                                                                        
##  [23] "terence zahner"                                                                         
##  [24] "Dino Biancolini"                                                                        
##  [25] "daisymagoo"                                                                             
##  [26] "Julius Stölzle"                                                                         
##  [27] "marinakyprianou"                                                                        
##  [28] "Jay"                                                                                    
##  [29] "josepvilanova"                                                                          
##  [30] "Paweł Pieluszyński"                                                                     
##  [31] "James Telford"                                                                          
##  [32] "portiadog"                                                                              
##  [33] "xaviervilport"                                                                          
##  [34] "Павел Несмеянов"                                                                        
##  [35] "tmenut"                                                                                 
##  [36] "Giulia P"                                                                               
##  [37] "Paola Casale"                                                                           
##  [38] "oversteegen"                                                                            
##  [39] "Pauline Walsh Jacobson"                                                                 
##  [40] "maltamama"                                                                              
##  [41] "Xavier Rufray"                                                                          
##  [42] "Sylvain Le Bris"                                                                        
##  [43] "forajido"                                                                               
##  [44] "chris2184"                                                                              
##  [45] "hunterefs"                                                                              
##  [46] "Rafael Banon"                                                                           
##  [47] "Austen Novis"                                                                           
##  [48] "Carlo R Sanchez"                                                                        
##  [49] "Karl Questel"                                                                           
##  [50] "Chris Taklis"                                                                           
##  [51] "stacebird"                                                                              
##  [52] "Chiaramonte family"                                                                     
##  [53] "Idlegrraphics"                                                                          
##  [54] "George Allen"                                                                           
##  [55] "Neil DeMaster"                                                                          
##  [56] "frahome"                                                                                
##  [57] "Papageorgiou Nikolaos"                                                                  
##  [58] "Bernat Garrigós"                                                                        
##  [59] "Rachel Andres-Beck"                                                                     
##  [60] "apedretti"                                                                              
##  [61] "mbrunetto"                                                                              
##  [62] "sandranap"                                                                              
##  [63] "Benjamin Guichard (BioObs)"                                                             
##  [64] "Werner de Gier"                                                                         
##  [65] "sflott"                                                                                 
##  [66] "Orlando Bastião Surlogalli"                                                             
##  [67] "renata lepage"                                                                          
##  [68] NA                                                                                       
##  [69] "Merav Vonshak"                                                                          
##  [70] "vanilendil"                                                                             
##  [71] "silvio_lioce1"                                                                          
##  [72] "R/V Oregon II, NOAA"                                                                    
##  [73] "mothpup"                                                                                
##  [74] "Rahul Joshi"                                                                            
##  [75] "Thomas Chardon"                                                                         
##  [76] "Alejandro"                                                                              
##  [77] "Dicabu"                                                                                 
##  [78] "Steve Kastner"                                                                          
##  [79] "gianfrs"                                                                                
##  [80] "John C."                                                                                
##  [81] "Patricia Torres Pineda"                                                                 
##  [82] "Francesco Cecere"                                                                       
##  [83] "jbelback"                                                                               
##  [84] "Ana Carolina Hernández-Oquet"                                                           
##  [85] "Anonymous"                                                                              
##  [86] "Arturo Acero-Pizarro|Andrea Polanco-Fernández|José-Julian Tavera|Nacor Bolaños-Cubillos"
##  [87] "Alphons"                                                                                
##  [88] "Curtis Irving"                                                                          
##  [89] "T. P. Franco"                                                                           
##  [90] "João D'Andretta"                                                                        
##  [91] "Erika Mitchell"                                                                         
##  [92] "Antoni López-Arenas i Cama"                                                             
##  [93] "L. Weigt & T. Christiaan"                                                               
##  [94] "maycl"                                                                                  
##  [95] "Lauana S. Fadini"                                                                       
##  [96] "Jacksonian"                                                                             
##  [97] "agardner"                                                                               
##  [98] "Claude Nozères"                                                                         
##  [99] "Carlos Gonzalez"                                                                        
## [100] "Adrien Weckel (BioObs)"                                                                 
## [101] "Greg Lawrence"                                                                          
## [102] "ALMV; FMS; YLA; ODD"                                                                    
## [103] "bronyaur"                                                                               
## [104] "heliastes21"                                                                            
## [105] "Maguelone GRATEAU, Henri GRATEAU (Bio CODEP 26-07)"                                     
## [106] "jean-pierre, jean, corine (bio 26-07)"                                                  
## [107] "-927652254"                                                                             
## [108] "Anita Sprungk"                                                                          
## [109] "Institut Ecologia Litoral (IEL)"                                                        
## [110] "Mika Tomta"                                                                             
## [111] "Carmelo López Abad"                                                                     
## [112] "Pelagos"                                                                                
## [113] "bensonk"                                                                                
## [114] "Alex Ward"                                                                              
## [115] "José Valério"                                                                           
## [116] "Pascal RUEL (Aqua club Cruas)"                                                          
## [117] "Alfredo García de Vinuesa,  Pilar Sánchez"                                              
## [118] "MNLZ; ALMV; LFM; FMS; DZE"                                                              
## [119] "gamesman02"                                                                             
## 
## $depth
##   [1]     NA 46.000 15.000 28.000 10.565  6.710  0.000 21.500 40.000 10.000
##  [11]  3.000  6.555  6.000  1.000 11.800 11.600 19.200 18.800 17.700 17.000
##  [21] 11.000 10.900 10.700 11.200 10.100  9.000 14.000 24.700 25.000 14.200
##  [31] 19.700 13.600 16.000 15.100 16.200 19.400 17.400 27.600 14.900 27.000
##  [41] 17.900 13.900 18.500 13.300 14.800 16.600 15.300 12.700 12.200  7.800
##  [51] 11.900  7.900 12.100  6.300  6.900  5.800  6.700  6.100  8.100 11.500
##  [61]  8.500  5.100  6.400  9.800 10.600  9.400 16.300 24.900 13.700 13.000
##  [71] 16.800 16.500 20.600 23.500 15.900 22.500 21.700 14.100 25.500 25.900
##  [81]  8.700 23.700 18.600  9.500 25.800 14.300 16.700 17.300 17.600 16.900
##  [91] 18.200 19.100 19.500 21.800 12.400 23.100  5.500  5.700 20.000 15.700
## [101] 15.200 19.800 20.100  8.400  6.600 13.100 10.200  9.200 12.800 13.400
## [111] 21.400 22.900 23.200 21.300 22.100 23.400 20.400 18.000 17.100 18.300
## [121] 22.300 25.600 26.400 14.500 16.400 21.600 20.900 24.600 23.800 22.400
## [131] 22.700 21.200 22.800 15.400 18.100 28.200 26.200 21.000 10.300  7.200
## [141]  7.000 24.000 14.600 15.500 29.200 26.500 10.400
## 
## $locality
##  [1] NA                                       
##  [2] "Galician waters, FAO 27"                
##  [3] " Galician waters, FAO 27"               
##  [4] "Próximo a praia de Barra Nova."         
##  [5] "Próximo ao Rio Caravelas."              
##  [6] "Bahía de Gaira"                         
##  [7] "Bahía de Taganga"                       
##  [8] "Gulf of Mexico"                         
##  [9] "North Kampala Siusiu"                   
## [10] "Próximo a Rio Caravelas."               
## [11] "Mgarr Ix Xini"                          
## [12] "Isla Cayo Serranilla"                   
## [13] "lagoa de Maricá"                        
## [14] "Prince Rupert Bay, Portsmouth, Dominica"
## [15] "Regência."                              
## [16] "Curaçao"                                
## [17] "Bianca C"                               
## [18] "Límite punta Sur del Parque"            
## [19] "Maraldi"                                
## [20] "UTM25_33S_0425_3975"                    
## [21] "Illot de Benidorm"                      
## [22] "El Mascarat"                            
## [23] "Llomes de Reixes"                       
## [24] "Palestine - Gaza"                       
## [25] "El Albir"                               
## [26] "Nikolos Reef"                           
## [27] "Arenys de Mar"                          
## [28] "Frente a la CONANP"                     
## 
## $habitat
##  [1] NA                               "Formaciones Coralinas"         
##  [3] "Marine"                         "Pelágico"                      
##  [5] "Arenal"                         "Continuous Low Relief"         
##  [7] "Continuous Medium Relief"       "Spur and Groove High Relief"   
##  [9] "Isolated Low Relief (Patch)"    "Isolated Medium Relief (Patch)"
## [11] "Isolated HR Relief (Patch)"     "Spur and Groove low Relief"    
## [13] "Continuous High Relief"         "Rubble Low Relief"             
## 
## $year
## [1] 2022 2021 2020 2019 2018 2017 2016 2015 2014

The data check can be refined further. There are several options; the one we explore here is to verify that the coordinates provided are actually valid. Coordinates are often assigned to capital cities or to other locations on land, and the species we are analysing is marine. For this we use the CoordinateCleaner and bdc packages, as shown below:

library(bdc)
library(CoordinateCleaner)
# check that coordinates fall within valid ranges
check_pf <- 
  bdc::bdc_coordinates_outOfRange(
    data = flying_gbif1,
    lat = "decimalLatitude",
    lon = "decimalLongitude")
## 
## bdc_coordinates_outOfRange:
## Flagged 0 records.
## One column was added to the database.
# check for valid coordinates and for coordinates close to capitals (coordinates are often wrongly assigned to country capitals)

cl <- flying_gbif1 %>%
  select(acceptedScientificName, decimalLatitude, decimalLongitude) %>%
  rename(decimallongitude = decimalLongitude,
         decimallatitude = decimalLatitude,
         scientificName = acceptedScientificName) %>% 
  as_tibble() %>% 
  mutate(val = cc_val(., value = "flagged"),
         sea = cc_sea(., value = "flagged"),
         capital = cc_cap(., value = "flagged"))
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\letic\AppData\Local\Temp\RtmpEdLiZ8", layer: "ne_110m_land"
## with 127 features
## It has 3 fields
cl %>% 
  rename(decimalLongitude = decimallongitude,
         decimalLatitude = decimallatitude) %>% 
  bdc::bdc_quickmap(., col_to_map = "capital")  

cl %>% 
  rename(decimalLongitude = decimallongitude,
         decimalLatitude = decimallatitude) %>% 
  bdc::bdc_quickmap(., col_to_map = "sea") 
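
The range check above can be pictured with a few lines of base R. This is only a minimal sketch of the idea, not the implementation used by bdc: the data frame `toy`, its values, and the helper `in_range()` are hypothetical, and only the column names decimalLatitude and decimalLongitude come from the data above.

```r
# Toy records (hypothetical values); the last row has an impossible latitude
toy <- data.frame(
  decimalLatitude  = c(4.22, 36.0, -13.0, 95.0),
  decimalLongitude = c(119.0, 14.3, -38.5, 10.0)
)

# Hypothetical helper: TRUE when a coordinate pair lies within valid ranges
in_range <- function(lat, lon) {
  !is.na(lat) & !is.na(lon) &
    lat >= -90 & lat <= 90 &
    lon >= -180 & lon <= 180
}

toy$valid_range <- in_range(toy$decimalLatitude, toy$decimalLongitude)

# Rows that would need inspection
toy[!toy$valid_range, ]
```

A range check of this kind only catches impossible values. A point can pass it and still fall on land or in the wrong ocean, which is why the sea and capital tests are run as well.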

The output returned by the CoordinateCleaner and bdc packages indicates that the coordinates are valid, but some occurrences are flagged as close to capitals, and those are of no interest to us. To exclude occurrences near capitals and on land, we check the distribution of the flying gurnard across oceanographic regions using the waterBody variable, which records the body of water in which each occurrence was observed. We can also visualize the number of occurrences per region with ggplot2, a package for plotting data, as in the command, output, and plot below.

# inspect suspicious levels
flying_gbif1 %>% 
  distinct(waterBody) %>% 
  pull()
##  [1] NA                                          
##  [2] "Mar Caribe"                                
##  [3] "Gulf of Mexico"                            
##  [4] "Celebes Sea"                               
##  [5] "Mer Méditerranée"                          
##  [6] "Atlantic, Caribbean Sea, Prince Rupert Bay"
##  [7] "Caribbean"                                 
##  [8] "South Atlantic Ocean"                      
##  [9] "Mediterranean Sea"                         
## [10] "Mirabello bay"                             
## [11] "Mediterranean"
# waterBody
flying_gbif1 %>%
  group_by(waterBody) %>% 
  summarise(occ = length(scientificName)) %>% 
  ggplot(aes(occ, y=waterBody)) +
  geom_bar(stat = 'identity')
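
The same per-region count can be sketched in base R without any plotting. This is a minimal sketch on a toy vector: the labels are taken from the levels listed above, but the counts are made up and the object names `water_body`, `occ`, and `occ_df` are hypothetical.

```r
# Toy vector reusing some waterBody labels listed above (counts are made up)
water_body <- c("Mar Caribe", "Caribbean", NA, "Celebes Sea",
                "Mediterranean Sea", "Mediterranean", NA, "Caribbean")

# useNA = "ifany" keeps the NA level, which table() drops by default
occ <- table(water_body, useNA = "ifany")

# Convert to a data frame with the same column names used in the plot above
occ_df <- as.data.frame(occ, stringsAsFactors = FALSE)
names(occ_df) <- c("waterBody", "occ")

# Most frequent labels first
occ_df[order(-occ_df$occ), ]
```

Several labels in the real output refer to the same region in different languages or spellings (for example "Mar Caribe" and "Caribbean"), so counts per label are not the same as counts per region.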

The flying gurnard occurs in the Atlantic and the Mediterranean, so the occurrences recorded in the Celebes Sea are suspicious, since the Celebes Sea lies in the Pacific Ocean. We will therefore exclude those occurrences, but keep the occurrences whose waterBody is NA in this first analysis.

# source of the records in the wrong region
flying_gbif1 %>% 
  filter(waterBody %in% c("Celebes Sea")) %>% 
  distinct(datasetName)
## # A tibble: 1 × 1
##   datasetName                             
##   <chr>                                   
## 1 Diveboard - Scuba diving citizen science

The table produced by this filter shows that these occurrences were recorded by divers through a specific citizen-science platform. Most of the suspicious occurrences were associated with this kind of collection, so we will exclude the occurrences associated with it.

# 6 occurrences
flying_gbif1 %>% 
  filter(datasetName %in% c("Diveboard - Scuba diving citizen science"))
## # A tibble: 6 × 15
##   scientificName        acceptedScienti… decimalLatitude decimalLongitude issues
##   <chr>                 <chr>                      <dbl>            <dbl> <chr> 
## 1 Dactylopterus volita… Dactylopterus v…            4.22            119.  cdreps
## 2 Dactylopterus volita… Dactylopterus v…           36.0              14.3 cdreps
## 3 Dactylopterus volita… Dactylopterus v…           12.0             -61.8 cdreps
## 4 Dactylopterus volita… Dactylopterus v…          -13.0             -38.5 cdreps
## 5 Dactylopterus volita… Dactylopterus v…           36.0              14.3 cdreps
## 6 Dactylopterus volita… Dactylopterus v…           35.2              25.7 cdreps
## # … with 10 more variables: waterBody <chr>, basisOfRecord <chr>,
## #   occurrenceStatus <chr>, rightsHolder <chr>, datasetName <chr>,
## #   recordedBy <chr>, depth <dbl>, locality <chr>, habitat <chr>, year <int>
# remove all records from the suspicious dataset
flying_gbif_ok <- flying_gbif1 %>% 
  filter(!datasetName %in% c("Diveboard - Scuba diving citizen science"))
library(ggmap)
library(maps)
library(mapdata)

world <- map_data('world')
# check the points
# no occurrences in the Celebes Sea!
ggplot() +
  geom_polygon(data = world, aes(x = long, y = lat, group = group)) +
  coord_fixed() +
  theme_classic() +
  geom_point(data = flying_gbif_ok, aes(x = decimalLongitude, y = decimalLatitude), color = "red") +
  labs(x = "longitude", y = "latitude", title = expression(italic("Dactylopterus volitans")))
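
The filter on datasetName above is written with `!... %in% ...`, and that choice matters for records with missing values. The following minimal sketch in base R shows the difference from `!=`. The data frame `toy` and the label "Another dataset" are hypothetical; only the Diveboard dataset name comes from the output above.

```r
toy <- data.frame(
  datasetName = c("Diveboard - Scuba diving citizen science", NA, "Another dataset"),
  waterBody   = c("Celebes Sea", NA, "Mediterranean Sea"),
  stringsAsFactors = FALSE
)
suspect <- "Diveboard - Scuba diving citizen science"

# %in% never returns NA, so negating it keeps rows whose value is missing
keep_in <- !toy$datasetName %in% suspect

# != returns NA for missing values; a filter on this condition would drop those rows
keep_ne <- toy$datasetName != suspect

keep_in
keep_ne

# Rows kept by the %in% version, including the one with NA
toy[keep_in, ]
```

With `%in%`, records that have no datasetName stay in the data, which matches the decision to keep NA values in this first analysis.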

Another filtering criterion we will use here is depth by oceanographic region, since the flying gurnard is found at depths of at most 100 meters, in habitats such as reefs, estuarine regions, and sandy areas.

# check depth
flying_gbif_ok %>% 
  ggplot(aes(x = depth, fill = waterBody)) +
  geom_histogram()
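
The histogram only displays the depths. If the 100 meter limit were applied as an actual filter, it could look like the minimal sketch below, written in base R on toy data. The data frame `toy`, its values, and the name `max_depth` are hypothetical, and keeping records with a missing depth is an assumption made here, in line with keeping NA values elsewhere in this first analysis.

```r
max_depth <- 100  # meters, the limit quoted in the text above

toy <- data.frame(
  depth     = c(10.5, 46, NA, 171.57, 100),
  waterBody = c("Mar Caribe", "Mediterranean Sea", NA, "Gulf of Mexico", "Caribbean"),
  stringsAsFactors = FALSE
)

# Keep records with no depth or with a depth at or below the limit
keep <- is.na(toy$depth) | toy$depth <= max_depth
toy_ok <- toy[keep, ]
toy_ok
```

Testing `is.na()` first makes the treatment of missing depths explicit instead of leaving it to how the comparison handles NA.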

OBIS

But researchers do not live on GBIF alone! We can do the same data cleaning with OBIS, the other repository mentioned earlier. Here we use the robis package and its occurrence function, which is similar to occ_data from rgbif. In addition, we can identify problems through the flags column, as below:

## OBIS
flying_obis <- robis::occurrence("Dactylopterus volitans")
# check the data
names(flying_obis)
##   [1] "rightsHolder"                     "infraphylum"                     
##   [3] "country"                          "date_year"                       
##   [5] "scientificNameID"                 "scientificName"                  
##   [7] "individualCount"                  "dropped"                         
##   [9] "gigaclassid"                      "aphiaID"                         
##  [11] "decimalLatitude"                  "subclassid"                      
##  [13] "type"                             "gigaclass"                       
##  [15] "infraphylumid"                    "phylumid"                        
##  [17] "familyid"                         "catalogNumber"                   
##  [19] "occurrenceStatus"                 "basisOfRecord"                   
##  [21] "terrestrial"                      "id"                              
##  [23] "parvphylum"                       "order"                           
##  [25] "recordNumber"                     "dataset_id"                      
##  [27] "locality"                         "decimalLongitude"                
##  [29] "collectionCode"                   "date_end"                        
##  [31] "speciesid"                        "occurrenceID"                    
##  [33] "license"                          "date_start"                      
##  [35] "genus"                            "collectionID"                    
##  [37] "eventDate"                        "brackish"                        
##  [39] "coordinateUncertaintyInMeters"    "absence"                         
##  [41] "genusid"                          "originalScientificName"          
##  [43] "marine"                           "subphylumid"                     
##  [45] "institutionCode"                  "wrims"                           
##  [47] "date_mid"                         "class"                           
##  [49] "orderid"                          "kingdom"                         
##  [51] "classid"                          "phylum"                          
##  [53] "species"                          "subphylum"                       
##  [55] "subclass"                         "family"                          
##  [57] "kingdomid"                        "parvphylumid"                    
##  [59] "node_id"                          "flags"                           
##  [61] "sss"                              "shoredistance"                   
##  [63] "sst"                              "bathymetry"                      
##  [65] "scientificNameAuthorship"         "fieldNumber"                     
##  [67] "waterBody"                        "occurrenceRemarks"               
##  [69] "year"                             "day"                             
##  [71] "month"                            "recordedBy"                      
##  [73] "institutionID"                    "language"                        
##  [75] "modified"                         "datasetName"                     
##  [77] "specificEpithet"                  "coordinatePrecision"             
##  [79] "datasetID"                        "dynamicProperties"               
##  [81] "county"                           "maximumDepthInMeters"            
##  [83] "bibliographicCitation"            "continent"                       
##  [85] "samplingEffort"                   "minimumDepthInMeters"            
##  [87] "geodeticDatum"                    "depth"                           
##  [89] "stateProvince"                    "preparations"                    
##  [91] "footprintSRS"                     "samplingProtocol"                
##  [93] "parentEventID"                    "eventID"                         
##  [95] "eventRemarks"                     "footprintWKT"                    
##  [97] "associatedReferences"             "identifiedBy"                    
##  [99] "sampleSizeUnit"                   "island"                          
## [101] "organismQuantityType"             "taxonRank"                       
## [103] "countryCode"                      "georeferenceProtocol"            
## [105] "verbatimCoordinateSystem"         "sampleSizeValue"                 
## [107] "georeferenceRemarks"              "minimumElevationInMeters"        
## [109] "maximumElevationInMeters"         "eventTime"                       
## [111] "sex"                              "references"                      
## [113] "higherGeography"                  "verbatimEventDate"               
## [115] "higherClassification"             "vernacularName"                  
## [117] "dataGeneralizations"              "locationID"                      
## [119] "georeferenceSources"              "verbatimDepth"                   
## [121] "islandGroup"                      "taxonRemarks"                    
## [123] "habitat"                          "fieldNotes"                      
## [125] "verbatimLocality"                 "dateIdentified"                  
## [127] "ownerInstitutionCode"             "informationWithheld"             
## [129] "accessRights"                     "identificationQualifier"         
## [131] "taxonomicStatus"                  "locationRemarks"                 
## [133] "startDayOfYear"                   "locationAccordingTo"             
## [135] "organismQuantity"                 "georeferencedDate"               
## [137] "georeferencedBy"                  "georeferenceVerificationStatus"  
## [139] "acceptedNameUsage"                "lifeStage"                       
## [141] "verbatimLatitude"                 "higherGeographyID"               
## [143] "verbatimLongitude"                "nomenclaturalCode"               
## [145] "identificationVerificationStatus" "identificationRemarks"           
## [147] "verbatimSRS"                      "otherCatalogNumbers"             
## [149] "associatedMedia"                  "associatedSequences"
# select the columns of interest and drop duplicated rows
flying_obis1 <- flying_obis %>% 
  dplyr::select(scientificName, decimalLatitude, decimalLongitude, bathymetry,
                flags, waterBody, basisOfRecord, occurrenceStatus, rightsHolder, 
                datasetName, recordedBy, depth, locality, habitat) %>% 
  distinct()

# check reported problems (flags)
flying_obis1 %>% 
  distinct(flags)
## # A tibble: 8 × 1
##   flags                                   
##   <chr>                                   
## 1 NO_DEPTH                                
## 2 NO_DEPTH,ON_LAND                        
## 3 <NA>                                    
## 4 ON_LAND                                 
## 5 ON_LAND,NO_DEPTH                        
## 6 DEPTH_EXCEEDS_BATH                      
## 7 MIN_DEPTH_EXCEEDS_MAX,DEPTH_EXCEEDS_BATH
## 8 ON_LAND,DEPTH_EXCEEDS_BATH
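
The flags above are uppercase and comma-separated, and the same pair can appear in either order (NO_DEPTH,ON_LAND and ON_LAND,NO_DEPTH). Since `%in%` compares whole strings and is case-sensitive, one way to test for a single flag regardless of case and position is to split each string first. The sketch below uses base R only; the helper `has_flag()` is hypothetical and not part of robis, and dropping only ON_LAND records is an assumption made for illustration.

```r
# Flag strings copied from the output above
flags <- c("NO_DEPTH", "NO_DEPTH,ON_LAND", NA, "ON_LAND",
           "ON_LAND,NO_DEPTH", "DEPTH_EXCEEDS_BATH",
           "MIN_DEPTH_EXCEEDS_MAX,DEPTH_EXCEEDS_BATH",
           "ON_LAND,DEPTH_EXCEEDS_BATH")

# Hypothetical helper: TRUE when any of `bad` appears among the comma-separated flags
has_flag <- function(x, bad) {
  vapply(strsplit(toupper(x), ",", fixed = TRUE),
         function(parts) any(parts %in% toupper(bad)),
         logical(1))
}

# Records flagged as on land, whatever else is flagged with them
drop <- has_flag(flags, "on_land")

# Flags of the records that would be kept (NA means no problem was reported)
flags[!drop]
```

Records with no flags at all (NA) are kept, because a missing value never matches any of the listed flags.
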
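# check the waterBody values of records with NA in datasetName
# note: %in% is case-sensitive and the flags above are uppercase,
# so the lowercase values below match nothing and the flag condition removes no rows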
flying_obis1 %>% 
  filter(!flags %in% c("no_depth,on_land", "on_land", "on_land,depth_exceeds_bath", "depth_exceeds_bath,on_land"),
         is.na(datasetName)) %>% 
  distinct(waterBody)
## # A tibble: 19 × 1
##    waterBody             
##    <chr>                 
##  1 <NA>                  
##  2 Africa                
##  3 South America         
##  4 Caribbean             
##  5 atlantique            
##  6 Atlantic              
##  7 South Atlantic        
##  8 North Atlantic Ocean  
##  9 Caribe                
## 10 Gulf of Mexico        
## 11 North America         
## 12 Europe                
## 13 Asia                  
## 14 Atlantic Ocean        
## 15 North America Atlantic
## 16 indien                
## 17 Central America       
## 18 Caribbean Sea         
## 19 pacifique
# depth ok
flying_obis1 %>% 
  filter(!flags %in% c("no_depth,on_land", "on_land", "on_land,depth_exceeds_bath", "depth_exceeds_bath,on_land"),
         !is.na(datasetName),
         !waterBody %in% c("Asia", "indien", "pacifique")) %>% 
  ggplot(aes(x = depth, fill = waterBody)) +
  geom_histogram() 

# check levels
flying_obis1 %>% 
  filter(!flags %in% c("no_depth,on_land", "on_land", "on_land,depth_exceeds_bath", "depth_exceeds_bath,on_land"),
         !is.na(datasetName),
         !waterBody %in% c("Asia", "indien", "pacifique")) %>% 
  lapply(., unique)
## $scientificName
## [1] "Dactylopterus volitans"
## 
## $decimalLatitude
##  [1]  46.000000  39.540000   6.883333  39.539000  39.670000  44.979852
##  [7]  39.536000  44.957500  39.820000  39.850000  39.529000  41.180000
## [13]  -3.850000  49.916666  44.750000  11.273350  37.570000  37.720000
## [19]  39.486000  11.219170  53.267000  39.440000  44.680000  46.833332
## [25]  10.954585  16.885000  43.494000   6.933333  11.833330  41.671800
## [31]   6.483333  33.730000  50.250000  43.716667  45.250000  39.230000
## [37]  53.267002 -22.950000  15.793972   2.950000  39.840000  40.110000
## [43]  50.210797 -23.433333  16.783300  43.049999  40.504333  42.766666
## [49]  43.050000  50.047152  19.800000  43.680000  18.708000  39.310000
## [55] -23.000000  45.020000  49.973620  37.550000  11.207970  42.116665
## [61] -27.500000
## 
## $decimalLongitude
##  [1] -42.025000   2.330000 -55.433333   2.592700   2.450000 -65.806016
##  [7]   2.376600 -45.018900   4.180000   2.740000   2.439300   1.790000
## [13] -33.816667  -5.559220 -65.516700 -74.206970  -0.730000  -0.640000
## [19]   2.477000 -74.240580  -9.045296   3.330000 -63.130000 -53.750000
## [25] -64.145604 -25.000000 -65.468400 -54.983333   2.476800 -66.750000
## [31]   2.802900 -56.200000 -76.900000  -6.333333 -60.500000 -57.800000
## [37]   2.980000  -9.045303 -42.166667 -79.848611 -49.066667   4.090000
## [43]   3.690000 -60.529810 -44.200000 -88.066700 -64.033300 -73.629000
## [49] -64.666700  -5.405545 -91.333300 -65.330000 -91.570000   3.260000
## [55] -42.000000 -65.894400  -6.264514  -0.650000 -74.239700 -65.566700
## [61] -48.500000
## 
## $bathymetry
##  [1] 4687.0   82.4   49.0    4.0   64.0   84.0    1.0 4438.0   69.6   63.0
## [11]   23.0   25.2   11.0   68.4   -3.0  -26.0   57.4   43.8   12.0  -32.0
## [21]   -6.0   51.4    9.0  100.0   -1.0   -2.0   94.0    3.0   11.8   31.0
## [31]   41.0   86.0   56.0   92.0   58.4   28.0    2.0   72.0   74.0  192.0
## [41]    8.0   55.0   50.0   90.0   16.0  110.0   33.0   25.0   -9.0   93.0
## [51]  -97.0   87.0   49.4  108.0   22.0  310.0  -87.0
## 
## $flags
## [1] "NO_DEPTH"         NA                 "ON_LAND"          "ON_LAND,NO_DEPTH"
## 
## $waterBody
## [1] NA                   "Bay of Fundy"       "West Atlantic"     
## [4] "Mar Caribe"         "Atlantic Ocean"     "Northwest Atlantic"
## [7] "Terminos Lagoon"   
## 
## $basisOfRecord
## [1] "PreservedSpecimen"      "Occurrence"             "MachineObservation"    
## [4] "NomenclaturalChecklist" "HumanObservation"       "MaterialSample"        
## 
## $occurrenceStatus
## [1] NA         "present"  "Present"  "Presente"
## 
## $rightsHolder
##  [1] "The Huntsman Marine Science Centre and Fisheries & Oceans Canada"                                  
##  [2] NA                                                                                                  
##  [3] "Museu Nacional, Universidade Federal do Rio de Janeiro"                                            
##  [4] "MoAm SAS | CORPAMAG"                                                                               
##  [5] "Nova Scotia Museum"                                                                                
##  [6] "O'Connor, Shannon Elizabeth"                                                                       
##  [7] "Her Majesty the Queen in right of Canada, as represented by the Minister of Fisheries and Oceans'" 
##  [8] "INVEMAR"                                                                                           
##  [9] "'Her Majesty the Queen in right of Canada, as represented by the Minister of Fisheries and Oceans'"
## [10] "Canadian Museum of Nature"                                                                         
## 
## $datasetName
##  [1] "Atlantic Reference Centre"                                                                                                                                                
##  [2] "MEDITS-Spain: Demersal and mega-benthic species from the MEDITS (Mediterranean International Trawl Survey) project on the Spanish continental shelf between 1994 and 2010"
##  [3] "Fish collection of National Museum of Nature and Science"                                                                                                                 
##  [4] "Global Marine biodiversity data from Seawatchers Marine Citizen Science Platform 1980-2020"                                                                               
##  [5] "Bay of Fundy Species List"                                                                                                                                                
##  [6] "Coleção Ictiológica (MNRJ), Museu Nacional(MN), Universidade Federal do Rio de Janeiro(UFRJ)"                                                                             
##  [7] "1743-2010 National Marine Aquarium (NMA) United Kingdom Marine Fish Recording Scheme"                                                                                     
##  [8] "CORPAMAG-MOAM-CONTRATO260-2017-ARRECIFES-PASTOS-2018"                                                                                                                     
##  [9] "Marine Recorder Snapshot extract of surveys entered by National Museums Northern Ireland (NMNI)"                                                                          
## [10] "Nova Scotia Museum - Marine Birds, Mammals, and Fishes"                                                                                                                   
## [11] "Ictiofauna_de_Lagunas_Costeras_Margarita"                                                                                                                                 
## [12] "Discovery Expedition Biological Reports"                                                                                                                                  
## [13] "Acadia University: Juvenile Fish Assemblages collected in bays along the Atlantic Coast of mainland Nova Scotia during summers of 2005 and 2006"                          
## [14] "MMM_ALR_FISH"                                                                                                                                                             
## [15] "ECNASAP - East Coast North America Strategic Assessment"                                                                                                                  
## [16] "Biodiversidad íctica de la Isla Cayo Serranilla Expedición Seaflower 2017 Proyecto Colombia BIO"                                                                          
## [17] "Northern Gulf of St. Lawrence Fishes"                                                                                                                                     
## [18] "University of Kansas Biodiversity Institute Fish Tissue Collection"                                                                                                       
## [19] "VIMS NorthEast Area Monitoring and Assessment Program"                                                                                                                    
## [20] "Fish"                                                                                                                                                                     
## [21] "University of Kansas Biodiversity Institute Fish Voucher Collection"                                                                                                      
## [22] "Terminos Lagoon Fish Occurrence"                                                                                                                                          
## [23] "Marine Recorder Snapshot extract of surveys entered by SeaSearch"                                                                                                         
## 
## $recordedBy
##  [1] NA                                                                                             
##  [2] "G.W.Nunan|D.F.Moraes Jr."                                                                     
##  [3] "Contact: info@nmni.com"                                                                       
##  [4] "Pablo_Ramírez_Villarroel"                                                                     
##  [5] "O'Connor, Shannon Elizabeth. 2008"                                                            
##  [6] "G.W.Nunan"                                                                                    
##  [7] "Arturo Acero-Pizarro | Andrea Polanco-Fernández | José-Julian Tavera | Nacor Bolaños-Cubillos"
##  [8] "J.A.Oliveira"                                                                                 
##  [9] "AMIK-2008"                                                                                    
## [10] "Navio Toko Maru"                                                                              
## [11] "Brothers, Edward B; Johnson, David S"                                                         
## [12] "[irn: 11130]"                                                                                 
## [13] "R/V Oregon II"                                                                                
## [14] "G.W.Nunan|Décio F.M.Junior"                                                                   
## [15] "Contact: info@seasearch.org.uk"                                                               
## [16] "G.W.Nunan|D.F.Moraes Jr.|W. Bandeira"                                                         
## 
## $depth
##  [1]     NA  83.20  49.00  15.00  62.00  11.00  58.00  63.40  10.00  30.00
## [11]   0.00  45.00  47.75  53.20   6.00   9.00  34.00  57.60  64.80 171.57
## [21]  27.00  50.00  26.00  92.14   7.00  52.40   9.50 101.00 165.00
## 
## $locality
##  [1] "unspecified"                                                                                                          
##  [2] NA                                                                                                                     
##  [3] "Bay of Fundy"                                                                                                         
##  [4] "groove on the seaward margin of the reef, off the southern extremity of Ilha do Farol, Atol das Rocas, mun. Natal, RN"
##  [5] "\"old fishway\", annapolis tidal generating station, annapolis royal, EBB tide"                                       
##  [6] "Bahía de Taganga"                                                                                                     
##  [7] "Bahía de Gaira"                                                                                                       
##  [8] "Harricott Beach, St. Mary's Bay"                                                                                      
##  [9] "Laguna de Boca de Palo"                                                                                               
## [10] "St Vincent, Cape Verde Islands"                                                                                       
## [11] "Port La Tour"                                                                                                         
## [12] "Archipiélago Los Roques"                                                                                              
## [13] "Stratum.54"                                                                                                           
## [14] "Prainha, Arraial do Cabo, RJ"                                                                                         
## [15] "Isla Cayo Serranilla"                                                                                                 
## [16] "Estação 1999, foz do Amazonas (entre Salinópolis e Cabo Orange), AP"                                                  
## [17] "Carrie Bow Cay, plankton net off dock"                                                                                
## [18] "04"                                                                                                                   
## [19] "500 km SE of Sable Island (NS)"                                                                                       
## [20] "Gulf of Mexico: Campeche Banks Oregon Sta. 440-445"                                                                   
## [21] "Lado sudeste da Ilha de Cabo Frio"                                                                                    
## [22] "ilha de Santa Catarina, SC"                                                                                           
## 
## $habitat
## [1] NA                      "Marine"                "Formaciones Coralinas"
# ok
flying_obis_ok <- flying_obis1 %>% 
  filter(!flags %in% c("no_depth,on_land", "on_land", "on_land,depth_exceeds_bath", "depth_exceeds_bath,on_land"),
         !is.na(datasetName),
         !waterBody %in% c("Asia", "indien", "pacifique", NA)) 
# check
ggplot() +
  geom_polygon(data = world, aes(x = long, y = lat, group = group)) +
  coord_fixed() +
  theme_classic() +
  geom_point(data = flying_obis_ok, aes(x = decimalLongitude, y = decimalLatitude, color = waterBody)) +
  labs(x = "longitude", y = "latitude", title = expression(italic("Dactylopterus volitans")))
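The filter above lists each flag combination by hand, so a new combination containing on_land would slip through. As a minimal sketch, assuming every on-land record carries the substring "on_land" in flags, a pattern match does the same job. The toy_obis data frame below is hypothetical and only stands in for flying_obis1.

```r
# Hypothetical toy data standing in for flying_obis1 (values are made up)
toy_obis <- data.frame(
  id    = 1:5,
  flags = c(NA, "on_land", "no_depth,on_land",
            "depth_exceeds_bath", "on_land,depth_exceeds_bath"),
  stringsAsFactors = FALSE
)

# Keep rows whose flags do not mention "on_land".
# grepl() returns FALSE for NA, so records without flags are kept,
# which matches the behavior of the %in% filter.
toy_ok <- toy_obis[!grepl("on_land", toy_obis$flags, fixed = TRUE), ]
toy_ok$id
```

Like the original filter, this keeps records flagged only with depth_exceeds_bath; whether those should also go is a separate decision.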

To finish cleaning the data, we will join the GBIF and OBIS occurrences into a single data frame and check for duplicates or other problems.

# join GBIF and OBIS

# check which columns differ
setdiff(names(flying_gbif_ok), names(flying_obis_ok))
## [1] "acceptedScientificName" "issues"                 "year"
setdiff(names(flying_obis_ok), names(flying_gbif_ok))
## [1] "bathymetry" "flags"
# tag each row with its source repository, then stack both tables
all_data <- bind_rows(flying_gbif_ok %>% 
                        mutate(repo = paste0("gbif", row.names(.))), 
                      flying_obis_ok %>% 
                        mutate(repo = paste0("obis", row.names(.)))) %>%
  column_to_rownames("repo") %>% 
  dplyr::select(decimalLongitude, decimalLatitude, depth, year, habitat) %>% 
  # drop records that repeat the same coordinates, depth, year and habitat
  distinct() %>% 
  rownames_to_column("occ") %>% 
  # the first 4 characters of the row name hold the source ("gbif" or "obis")
  separate(col = "occ", into = c("datasetName", "rn"), sep = 4) %>%
  mutate(scientificName = "Dactylopterus volitans") %>% 
  dplyr::select(-rn)
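Before dropping duplicates it can help to know how many there are. The base R sketch below counts them on a hypothetical toy table (toy_all, with made-up values and a made-up source column) using the same kind of key columns as the pipeline above.

```r
# Hypothetical toy table: rows 1 and 2 are the same record in both repositories
toy_all <- data.frame(
  decimalLongitude = c(-43.0, -43.0, -38.5, -38.5),
  decimalLatitude  = c(-22.9, -22.9, -12.9, -12.9),
  depth            = c(10, 10, 5, NA),
  year             = c(2014, 2014, 2010, 2010),
  source           = c("gbif", "obis", "gbif", "obis")
)

# Columns that define "the same occurrence" in this sketch
key_cols <- c("decimalLongitude", "decimalLatitude", "depth", "year")

# duplicated() flags every repeat after the first appearance
is_dup <- duplicated(toy_all[, key_cols])
sum(is_dup)

# Keep only the first appearance of each record
toy_unique <- toy_all[!is_dup, ]
table(toy_unique$source)
```

Rows 3 and 4 are not treated as duplicates because one of them has a missing depth, so records that differ only by an NA survive this kind of check.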


# map occurrences
ggplot() +
  geom_polygon(data = world, aes(x = long, y = lat, group = group)) +
  coord_fixed() +
  theme_classic() +
  geom_point(data = all_data, aes(x = decimalLongitude, y = decimalLatitude, color = datasetName)) +
  #theme(legend.title = element_blank()) +
  labs(x = "longitude", y = "latitude", title = expression(italic("Dactylopterus volitans")))

write.csv(all_data, "C:/Users/letic/Documents/Pasta de atividades/Mestrado/Ciência colaborativa/occ_GBIF-OBIS_dac_voli.csv", row.names = FALSE)
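The absolute path above only works on one machine. A more portable sketch builds the path with file.path and creates the folder if needed. The folder name, the file name and the toy_out data frame are hypothetical, and tempdir() is used here only so the example can run anywhere.

```r
# Hypothetical toy data standing in for all_data
toy_out <- data.frame(
  decimalLongitude = c(-43.0, -38.5),
  decimalLatitude  = c(-22.9, -12.9)
)

# Build the output path from parts instead of hard-coding it
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
out_file <- file.path(out_dir, "occ_toy.csv")

# Write the table and read it back to confirm the round trip
write.csv(toy_out, out_file, row.names = FALSE)
check <- read.csv(out_file)
dim(check)
```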

Exploratory analyses

Let's explore our filtered data!
One question we can ask is: in which years was the flying gurnard recorded most often?

The highest frequency of occurrences was in 2014, and the frequency is lower in the years that follow, as the histogram below shows. It was generated with the hist function.

hist(all_data$year, xlab = "Year", ylab = "Frequency", main = "Histogram of Dactylopterus volitans occurrences")
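A histogram groups years into bins, so the exact peak year is easier to read from a count table. The sketch below uses a hypothetical vector of years (toy_years, made-up values) in place of all_data$year.

```r
# Hypothetical years standing in for all_data$year (NA = year not recorded)
toy_years <- c(2010, 2014, 2014, NA, 2014, 2016, 2010)

# table() drops NA by default; sort the counts from most to least frequent
year_counts <- sort(table(toy_years), decreasing = TRUE)
year_counts

# Year with the most records, and how many records have no year at all
names(year_counts)[1]
sum(is.na(toy_years))
```

Counting the missing years matters here because the OBIS table has no year column, so after joining the two sources its rows have NA in year.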

Another question we can ask is: in which habitat was the flying gurnard recorded most often?

To answer this question, we group the occurrences by habitat and by latitude. We use the plotly package and its ggplotly function, which turns a ggplot object into an interactive chart. In the interactive plot we can select or hide variables and get more specific information by hovering the cursor over each habitat. The plot shows that most occurrences are grouped in the habitat Continouns Midium Relief. Finally, most of the habitats recorded in this data frame lie at a latitude of about 25º, while only the habitat Arenal lies at -25º, which suggests that most flying gurnard records were made in the Northern Hemisphere.

library(plotly)
cc <- flying_gbif1 %>% 
  mutate(lat = round(decimalLatitude)) %>% 
  group_by(lat, habitat) %>%
  summarise(occ = length(habitat)) %>%
  ggplot(aes(y = occ, x = lat, color = habitat)) +
    geom_point() +
    theme_classic(base_size = 15) +
    labs(x = "Latitude", y = "Occurrences")
ggplotly(cc)
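The hemisphere claim can also be checked directly from the sign of the latitude instead of reading it off the plot. A minimal sketch, using a hypothetical vector of latitudes (toy_lat, made-up values) in place of decimalLatitude:

```r
# Hypothetical latitudes standing in for flying_gbif1$decimalLatitude
toy_lat <- c(25.3, 24.8, -25.1, 12.0, 44.6, -22.9)

# Positive latitudes are north of the equator, negative ones south
hemisphere <- ifelse(toy_lat >= 0, "north", "south")
table(hemisphere)

# Share of records in the Northern Hemisphere
prop_north <- mean(hemisphere == "north")
prop_north
```

A high share of northern records may reflect where sampling effort was concentrated as much as where the species actually lives.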

Final remarks

We now have a clean data frame, a few plots, and many questions still unasked. We learned that flying gurnard occurrences were concentrated in 2014 and in the habitat Continouns Midium Relief. We found this using data from GBIF and OBIS, which we cleaned and began to analyze with a few packages and functions. Just as the questions are many, so are the means and techniques for answering them.